Move the value assignment of vector x in gemv_n_sve.c to the outermos… #5420

Open
wants to merge 1 commit into
base: develop
yuanjia111

Move the assignment of the vector x values in gemv_n_sve.c to the outermost loop, so that x is not fetched again on every inner-loop iteration.
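The idea of hoisting the read of x can be illustrated with a minimal scalar sketch. This is not the code from gemv_n_sve.c (the kernel itself is not shown in this description, and it uses SVE intrinsics rather than plain loops); the function names `gemv_n_naive` and `gemv_n_hoisted` are hypothetical, and a column-major matrix with unit strides for x and y is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: y += alpha * A * x, A is m x n, column-major,
 * leading dimension lda, unit stride for x and y.
 * Here x[j] is re-read and re-scaled for every element of A. */
void gemv_n_naive(size_t m, size_t n, float alpha,
                  const float *a, size_t lda,
                  const float *x, float *y)
{
    assert(lda >= m);
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            y[i] += alpha * x[j] * a[i + j * lda];
        }
    }
}

/* Same computation with the read of x moved to the outermost loop:
 * x[j] is loaded and scaled once per column, then reused for all rows. */
void gemv_n_hoisted(size_t m, size_t n, float alpha,
                    const float *a, size_t lda,
                    const float *x, float *y)
{
    assert(lda >= m);
    for (size_t j = 0; j < n; j++) {
        const float temp = alpha * x[j];   /* one read of x per column */
        const float *col = a + j * lda;    /* contiguous column of A */
        for (size_t i = 0; i < m; i++) {
            y[i] += temp * col[i];
        }
    }
}
```

In the hoisted form the inner loop touches only A and y, both contiguously, which is the access pattern a vectorized kernel would presumably want; how closely this matches the actual change is an assumption.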
1. Verify correctness using BLAS-Tester, as follows:
./xsl2blastst
------------------------------- GEMV --------------------------------
TST# TR M N ALPHA LDA INCX BETA INCY TIME MFLOP SpUp TEST
==== == ==== ==== ===== ==== ==== ===== ==== ====== ===== ===== =====
0 N 100 100 1.0 1000 1 1.0 1 0.00 7025.5 1.00 -----
0 N 100 100 1.0 1000 1 1.0 1 0.00 2479.6 0.35 PASS
1 N 200 200 1.0 1000 1 1.0 1 0.00 8852.2 1.00 -----
1 N 200 200 1.0 1000 1 1.0 1 0.00 7312.7 0.83 PASS
2 N 300 300 1.0 1000 1 1.0 1 0.00 8593.6 1.00 -----
2 N 300 300 1.0 1000 1 1.0 1 0.00 3601.1 0.42 PASS
3 N 400 400 1.0 1000 1 1.0 1 0.00 8670.0 1.00 -----
3 N 400 400 1.0 1000 1 1.0 1 0.00 11892.5 1.37 PASS
4 N 500 500 1.0 1000 1 1.0 1 0.00 10044.3 1.00 -----
4 N 500 500 1.0 1000 1 1.0 1 0.00 13902.3 1.38 PASS
5 N 600 600 1.0 1000 1 1.0 1 0.00 9877.2 1.00 -----
5 N 600 600 1.0 1000 1 1.0 1 0.00 14461.3 1.46 PASS
6 N 700 700 1.0 1000 1 1.0 1 0.00 10309.2 1.00 -----
6 N 700 700 1.0 1000 1 1.0 1 0.00 10684.0 1.04 PASS
7 N 800 800 1.0 1000 1 1.0 1 0.00 10330.9 1.00 -----
7 N 800 800 1.0 1000 1 1.0 1 0.00 13739.3 1.33 PASS
8 N 900 900 1.0 1000 1 1.0 1 0.00 11108.7 1.00 -----
8 N 900 900 1.0 1000 1 1.0 1 0.00 12660.2 1.14 PASS
9 N 1000 1000 1.0 1000 1 1.0 1 0.00 11904.7 1.00 -----
9 N 1000 1000 1.0 1000 1 1.0 1 0.00 15629.1 1.31 PASS
10 tests run, 10 passed
./xdl2blastst
------------------------------- GEMV --------------------------------
TST# TR M N ALPHA LDA INCX BETA INCY TIME MFLOP SpUp TEST
==== == ==== ==== ===== ==== ==== ===== ==== ====== ===== ===== =====
0 N 100 100 1.0 1000 1 1.0 1 0.00 4959.1 1.00 -----
0 N 100 100 1.0 1000 1 1.0 1 0.00 1453.5 0.29 PASS
1 N 200 200 1.0 1000 1 1.0 1 0.00 4946.8 1.00 -----
1 N 200 200 1.0 1000 1 1.0 1 0.00 2587.6 0.52 PASS
2 N 300 300 1.0 1000 1 1.0 1 0.00 5179.7 1.00 -----
2 N 300 300 1.0 1000 1 1.0 1 0.00 7271.5 1.40 PASS
3 N 400 400 1.0 1000 1 1.0 1 0.00 5622.8 1.00 -----
3 N 400 400 1.0 1000 1 1.0 1 0.00 7424.6 1.32 PASS
4 N 500 500 1.0 1000 1 1.0 1 0.00 5673.6 1.00 -----
4 N 500 500 1.0 1000 1 1.0 1 0.00 7578.5 1.34 PASS
5 N 600 600 1.0 1000 1 1.0 1 0.00 5961.4 1.00 -----
5 N 600 600 1.0 1000 1 1.0 1 0.00 7932.8 1.33 PASS
6 N 700 700 1.0 1000 1 1.0 1 0.00 6213.5 1.00 -----
6 N 700 700 1.0 1000 1 1.0 1 0.00 9348.5 1.50 PASS
7 N 800 800 1.0 1000 1 1.0 1 0.00 6160.6 1.00 -----
7 N 800 800 1.0 1000 1 1.0 1 0.00 10252.0 1.66 PASS
8 N 900 900 1.0 1000 1 1.0 1 0.00 6751.3 1.00 -----
8 N 900 900 1.0 1000 1 1.0 1 0.00 10656.0 1.58 PASS
9 N 1000 1000 1.0 1000 1 1.0 1 0.00 7910.3 1.00 -----
9 N 1000 1000 1.0 1000 1 1.0 1 0.00 10597.0 1.34 PASS
10 tests run, 10 passed
2. Verify performance using the built-in benchmark: performance for the float (sgemv) and double (dgemv) types improved by about 60% and about 40%, respectively.
before optimization:
[root@localhost benchmark]# export OMP_NUM_THREADS=1;numactl -C 10 -l ./sgemv.goto 3000 4000 100
From : 3000 To : 4000 Step = 100 Trans = 'N' Inc_x = 1 Inc_y = 1 Loops = 1
SIZE Flops
3000x3000 : 11932.54 MFlops 0.001508 sec
3100x3100 : 11471.23 MFlops 0.001675 sec
3200x3200 : 11140.85 MFlops 0.001838 sec
3300x3300 : 11119.37 MFlops 0.001959 sec
3400x3400 : 11199.25 MFlops 0.002064 sec
3500x3500 : 11424.51 MFlops 0.002145 sec
3600x3600 : 11125.72 MFlops 0.002330 sec
3700x3700 : 11432.00 MFlops 0.002395 sec
3800x3800 : 11653.88 MFlops 0.002478 sec
3900x3900 : 11696.58 MFlops 0.002601 sec
4000x4000 : 11705.83 MFlops 0.002734 sec
[root@localhost benchmark]# export OMP_NUM_THREADS=1;numactl -C 10 -l ./dgemv.goto 3000 4000 100
From : 3000 To : 4000 Step = 100 Trans = 'N' Inc_x = 1 Inc_y = 1 Loops = 1
SIZE Flops
3000x3000 : 5260.93 MFlops 0.003421 sec
3100x3100 : 5490.46 MFlops 0.003501 sec
3200x3200 : 5318.63 MFlops 0.003851 sec
3300x3300 : 5284.31 MFlops 0.004122 sec
3400x3400 : 5243.10 MFlops 0.004410 sec
3500x3500 : 5317.14 MFlops 0.004608 sec
3600x3600 : 5004.25 MFlops 0.005180 sec
3700x3700 : 5351.32 MFlops 0.005116 sec
3800x3800 : 5221.78 MFlops 0.005531 sec
3900x3900 : 5224.54 MFlops 0.005823 sec
4000x4000 : 5194.21 MFlops 0.006161 sec
after optimization:
[root@localhost benchmark]# export OMP_NUM_THREADS=1;numactl -C 10 -l ./sgemv.goto 3000 4000 100
From : 3000 To : 4000 Step = 100 Trans = 'N' Inc_x = 1 Inc_y = 1 Loops = 1
SIZE Flops
3000x3000 : 17268.24 MFlops 0.001042 sec
3100x3100 : 19730.47 MFlops 0.000974 sec
3200x3200 : 16947.36 MFlops 0.001208 sec
3300x3300 : 18414.80 MFlops 0.001183 sec
3400x3400 : 18785.26 MFlops 0.001231 sec
3500x3500 : 18939.75 MFlops 0.001294 sec
3600x3600 : 17325.09 MFlops 0.001496 sec
3700x3700 : 18647.87 MFlops 0.001468 sec
3800x3800 : 18729.12 MFlops 0.001542 sec
3900x3900 : 19344.94 MFlops 0.001573 sec
4000x4000 : 18068.97 MFlops 0.001771 sec
[root@localhost benchmark]# export OMP_NUM_THREADS=1;numactl -C 10 -l ./dgemv.goto 3000 4000 100
From : 3000 To : 4000 Step = 100 Trans = 'N' Inc_x = 1 Inc_y = 1 Loops = 1
SIZE Flops
3000x3000 : 7592.27 MFlops 0.002371 sec
3100x3100 : 7880.05 MFlops 0.002439 sec
3200x3200 : 7531.85 MFlops 0.002719 sec
3300x3300 : 7511.61 MFlops 0.002900 sec
3400x3400 : 7332.10 MFlops 0.003153 sec
3500x3500 : 7235.68 MFlops 0.003386 sec
3600x3600 : 7010.80 MFlops 0.003697 sec
3700x3700 : 7107.42 MFlops 0.003852 sec
3800x3800 : 6901.65 MFlops 0.004185 sec
3900x3900 : 6898.33 MFlops 0.004410 sec
4000x4000 : 6809.35 MFlops 0.004699 sec
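As a cross-check of the "about 60%" and "about 40%" figures, the average improvement can be recomputed from the MFlops columns above. The sketch below copies the eleven values per run from the benchmark output; the helper names `mean` and `speedup_percent` are hypothetical, and comparing arithmetic means is only one reasonable way to summarize the runs.

```c
#include <assert.h>
#include <stddef.h>

/* MFlops values copied from the benchmark output above (3000..4000, step 100). */
const double sgemv_before[11] = {11932.54, 11471.23, 11140.85, 11119.37,
                                 11199.25, 11424.51, 11125.72, 11432.00,
                                 11653.88, 11696.58, 11705.83};
const double sgemv_after[11]  = {17268.24, 19730.47, 16947.36, 18414.80,
                                 18785.26, 18939.75, 17325.09, 18647.87,
                                 18729.12, 19344.94, 18068.97};
const double dgemv_before[11] = {5260.93, 5490.46, 5318.63, 5284.31,
                                 5243.10, 5317.14, 5004.25, 5351.32,
                                 5221.78, 5224.54, 5194.21};
const double dgemv_after[11]  = {7592.27, 7880.05, 7531.85, 7511.61,
                                 7332.10, 7235.68, 7010.80, 7107.42,
                                 6901.65, 6898.33, 6809.35};

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum / (double)n;
}

/* Improvement of the mean throughput, in percent. */
double speedup_percent(const double *before, const double *after, size_t n)
{
    return (mean(after, n) / mean(before, n) - 1.0) * 100.0;
}
```

Calling `speedup_percent(sgemv_before, sgemv_after, 11)` and `speedup_percent(dgemv_before, dgemv_after, 11)` should give values close to the quoted 60% and 40%.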

…t loop to reduce the repeated data retrieval.

    1. Verify correctness using BLAS-Tester
    2. Verify performance using the built-in benchmark: performance for the float and double types improved by about 60% and about 40%, respectively. The test commands are:
     export OMP_NUM_THREADS=1;numactl -C 10 -l ./sgemv.goto 3000 4000 100
     export OMP_NUM_THREADS=1;numactl -C 10 -l ./dgemv.goto 3000 4000 100